Using Tidycensus to Make Maps in R: A Walkthrough

Author

Riya Sharma

First things first, let’s go ahead and reach into our library and load all necessary packages, namely tidycensus.
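
The library calls themselves aren’t shown above, so here is a minimal sketch of a setup chunk. The package list is an assumption, inferred from the functions called later in this walkthrough (get_acs(), select(), mapview(), glue(), HTML(), brewer.pal(), and sync(), which is assumed here to come from leafsync); adjust it to your own setup.

```r
# Packages this walkthrough appears to rely on (an assumed list)
pkgs <- c("tidycensus", "dplyr", "mapview", "glue",
          "htmltools", "RColorBrewer", "leafsync")

# Check which ones are installed without attaching anything yet
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]

if (length(not_installed) > 0) {
  message("Install these first: ", paste(not_installed, collapse = ", "))
} else {
  # Attach the packages whose functions are called without a pkg:: prefix
  library(tidycensus)
  library(dplyr)
  library(mapview)
}
```

Only tidycensus, dplyr, and mapview are attached because the later chunks call glue, htmltools, and RColorBrewer with an explicit pkg:: prefix.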

Getting Started

tidycensus is an extremely useful package if you’re someone who likes to work with US Census data. If you’ve ever explored the census.gov website, you’ll know that their downloadable datasets are not exactly readable, especially in R.

Tidycensus allows you to search for variables, build clean datasets, and even use the information to make great visualizations, like maps! Today, our task is to visualize income inequality between Native Americans and White Americans in the US. To do so, we’ll analyze median household income across the 50 states and create a map (using the mapview package) displaying this variable.

First, you’ll need to install and/or load your Census API key. You can learn how to do so here.
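
As a rough sketch of that step: tidycensus reads the key from the CENSUS_API_KEY environment variable, and census_api_key() with install = TRUE saves it for future sessions. The string "YOUR_KEY_HERE" is a placeholder, not a real key.

```r
# tidycensus looks for the key in the CENSUS_API_KEY environment variable
key <- Sys.getenv("CENSUS_API_KEY")

if (nzchar(key)) {
  message("Census API key found.")
} else {
  # "YOUR_KEY_HERE" is a placeholder for the key the Census Bureau emails you
  message("No key found. Request one from the Census Bureau, then run once:")
  message('tidycensus::census_api_key("YOUR_KEY_HERE", install = TRUE)')
}
```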

Loading Variables for Analysis: Native Americans

Now, it’s time to explore the different variables you can use in your analysis. To do so, you’ll use the load_variables() function and input the year and type of Census survey. We’ll be pulling from the 2021 ACS5 in this example. This is a big dataset, so I find it’s easier to use View() than head() here.

Code
acs = load_variables(2021, "acs5", cache = TRUE)

head(acs)
# A tibble: 6 × 4
  name        label                                   concept            geogr…¹
  <chr>       <chr>                                   <chr>              <chr>  
1 B01001A_001 Estimate!!Total:                        SEX BY AGE (WHITE… tract  
2 B01001A_002 Estimate!!Total:!!Male:                 SEX BY AGE (WHITE… tract  
3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years  SEX BY AGE (WHITE… tract  
4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years   SEX BY AGE (WHITE… tract  
5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years SEX BY AGE (WHITE… tract  
6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years SEX BY AGE (WHITE… tract  
# … with abbreviated variable name ¹​geography
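
If you’d rather search than scroll, one option is to filter the variable table by keyword. The sketch below uses a tiny stand-in table so it runs without an API key; the two income labels are placeholders rather than the exact Census wording, and acs_toy is a hypothetical name.

```r
# Toy stand-in for the table returned by load_variables().
# The last two labels are placeholders, not the real Census text.
acs_toy <- data.frame(
  name  = c("B01001A_001", "B01001A_002", "B19013C_001", "B19025C_001"),
  label = c("Estimate!!Total:",
            "Estimate!!Total:!!Male:",
            "median household income (placeholder)",
            "aggregate household income (placeholder)"),
  stringsAsFactors = FALSE
)

# Keep the rows whose label mentions "income", ignoring case
income_rows <- acs_toy[grepl("income", acs_toy$label, ignore.case = TRUE), ]
income_rows$name
```

The same grepl() filter should work on the real acs table, where the concept column is also worth searching because it carries the race or ethnicity group.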

For now, I want to focus on Native Americans’ income data, so I’ll use the following variables: median household income for Native Americans (B19013C_001) and aggregate household income for Native Americans (B19025C_001).

I’m combining these in a vector and assigning them to a variable called acs_vars. Within the vector, I gave the variables more readable names for analysis purposes.

Code
acs_vars = c(median_income = "B19013C_001",
             aggregate_income = "B19025C_001")
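
A quick sketch of why the names matter: they travel with the vector and resurface later as column names, each with an E (estimate) or M (margin of error) suffix, as the wide output further down shows. The wide_cols object is a hypothetical helper used only for illustration.

```r
acs_vars <- c(median_income = "B19013C_001",
              aggregate_income = "B19025C_001")

# The readable names and the underlying Census codes
names(acs_vars)
unname(acs_vars)

# Column names to expect in wide output: every name with an E and an M suffix
wide_cols <- as.vector(outer(names(acs_vars), c("E", "M"), paste0))
wide_cols
```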

Creating a Dataframe

Now, because we are going to make maps to visualize income inequality across the US, we’ll need to create a dataset that breaks median household income down by state and includes the shapefile info needed to generate a map in mapview.

We do this with the get_acs() function, whose arguments include geography, variables, output, and geometry. Here’s a breakdown of what these mean:

  • geography = "state": we pull data at the state level
  • variables = c(acs_vars): we use the variables we defined previously (median_income, aggregate_income) in our dataset
  • output = "wide": this makes the data easier to read by giving each variable its own column
  • geometry = TRUE: this includes the shapefile data needed to make a map

Code
# pull for US states
us_native_income <- get_acs(geography = "state",
                               variables = c(acs_vars),
                               output = "wide",
                               geometry = TRUE)
Getting data from the 2017-2021 5-year ACS
Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

Code
head(us_native_income)
Simple feature collection with 6 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1489 ymin: 33.00429 xmax: 179.7785 ymax: 71.36516
Geodetic CRS:  NAD83
  GEOID         NAME median_incomeE median_incomeM aggregate_incomeE
1    56      Wyoming          54363           6016         238400300
2    02       Alaska          53061           2118        2062309500
3    24     Maryland          76025           7661         535591000
4    05     Arkansas          43386           2920         345330000
5    38 North Dakota          40489           4323         645661900
6    10     Delaware          51661           5403          95905400
  aggregate_incomeM                       geometry
1          39202023 MULTIPOLYGON (((-111.0546 4...
2          85529516 MULTIPOLYGON (((179.4825 51...
3          58775225 MULTIPOLYGON (((-76.05015 3...
4          33235631 MULTIPOLYGON (((-94.61792 3...
5          48678105 MULTIPOLYGON (((-104.0487 4...
6          23654496 MULTIPOLYGON (((-75.56555 3...

Cleaning Data: Native Americans

Looking at the first few rows, we can see that each variable comes with two columns: the estimate, marked with a trailing E, and the margin of error, marked with a trailing M. Since we are only concerned with the estimates, we’ll remove the margin of error columns, i.e., any column ending with “M”.

Code
us_native_income <- us_native_income %>%
  select(-ends_with("M"))

head(us_native_income)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1489 ymin: 33.00429 xmax: 179.7785 ymax: 71.36516
Geodetic CRS:  NAD83
  GEOID         NAME median_incomeE aggregate_incomeE
1    56      Wyoming          54363         238400300
2    02       Alaska          53061        2062309500
3    24     Maryland          76025         535591000
4    05     Arkansas          43386         345330000
5    38 North Dakota          40489         645661900
6    10     Delaware          51661          95905400
                        geometry
1 MULTIPOLYGON (((-111.0546 4...
2 MULTIPOLYGON (((179.4825 51...
3 MULTIPOLYGON (((-76.05015 3...
4 MULTIPOLYGON (((-94.61792 3...
5 MULTIPOLYGON (((-104.0487 4...
6 MULTIPOLYGON (((-75.56555 3...
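
To see what that selection does, here is a base R sketch on the column names alone. One thing worth knowing: ends_with() ignores case by default, so a column ending in a lowercase “m” would be dropped too. That does not affect this dataset, but the second line below mimics the case-insensitive behavior for comparison.

```r
cols <- c("GEOID", "NAME", "median_incomeE", "median_incomeM",
          "aggregate_incomeE", "aggregate_incomeM", "geometry")

# Case-sensitive: drop only names ending in a capital "M"
kept_strict <- cols[!endsWith(cols, "M")]

# Case-insensitive, closer to the default behavior of ends_with("M")
kept_loose <- cols[!grepl("m$", cols, ignore.case = TRUE)]

kept_strict
identical(kept_strict, kept_loose)
```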

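Next, we’ll remove the trailing “E” by using the sub() function. The $ in the pattern anchors the match to the end of the string, so only a final “E” is removed. Note that this also trims NAME to NAM, which is why the state-name column appears as NAM from here on.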

Code
colnames(us_native_income) <- sub("E$", "", colnames(us_native_income)) 

head(us_native_income)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1489 ymin: 33.00429 xmax: 179.7785 ymax: 71.36516
Geodetic CRS:  NAD83
  GEOID          NAM median_income aggregate_income
1    56      Wyoming         54363        238400300
2    02       Alaska         53061       2062309500
3    24     Maryland         76025        535591000
4    05     Arkansas         43386        345330000
5    38 North Dakota         40489        645661900
6    10     Delaware         51661         95905400
                        geometry
1 MULTIPOLYGON (((-111.0546 4...
2 MULTIPOLYGON (((179.4825 51...
3 MULTIPOLYGON (((-76.05015 3...
4 MULTIPOLYGON (((-94.61792 3...
5 MULTIPOLYGON (((-104.0487 4...
6 MULTIPOLYGON (((-75.56555 3...
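
A small sketch of the pattern’s effect on the column names, along with one narrower alternative. The narrower pattern is a suggestion, not what this walkthrough uses; if you adopt it, the popup code later on would need NAME instead of NAM.

```r
cols <- c("GEOID", "NAME", "median_incomeE", "aggregate_incomeE", "geometry")

# The pattern used above: strips any final "E", including the one in NAME
broad <- sub("E$", "", cols)

# A narrower pattern that only touches the income columns
narrow <- sub("incomeE$", "income", cols)

broad
narrow
```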

For cosmetic reasons, we’ll place a $ in front of the median income estimates with the paste() function. I’m making this a new variable because, as we’ll see later, it’ll only be used to present information.

Code
us_native_income$median_income_signed <-paste("$",us_native_income$median_income)
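
One detail to be aware of: paste() separates its arguments with a space by default, so the values above come out like “$ 54363”. The sketch below compares that with two tighter options, using three of the medians from the output above. Treat the comma formatting as an optional variation, not part of the original workflow.

```r
median_income <- c(54363, 53061, 76025)

# paste() inserts a space between the dollar sign and the number
with_space <- paste("$", median_income)

# paste0() joins them directly
no_space <- paste0("$", median_income)

# formatC() can add thousands separators before the dollar sign is attached
with_commas <- paste0("$", formatC(median_income, format = "d", big.mark = ","))

with_space
no_space
with_commas
```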

Making a Map: Native Americans’ Median Household Income in the US

Now for the exciting part: mapmaking! There are a couple of components that we’ll use to make this map look very nice. One is a popup. Popups show up when you click on a specific state on the map. We want to display the state name and Native Americans’ median household income in that state, so we’ll use the glue package to do the following:

Code
mylabel <- glue::glue("<strong>{us_native_income$NAM}</strong><br />
                      Median Native American Household Income: {us_native_income$median_income_signed}") %>% 
  lapply(htmltools::HTML)

# NOTE: this function utilizes HTML syntax. For now, you just need to know to include the dataset$variable name you want to pull from. NAM is the state's name, and median_income_signed is the median income with the dollar sign.

Now for our actual map! We’ll use mapview().

  • The first argument is our dataframe.
  • zcol = "median_income" is the column used to color the map.
  • at = seq(20000, 120000, 15000) sets the legend breaks every 15,000 dollars, starting at 20,000 dollars. Because 120,000 is not a whole number of steps away from 20,000, the last break is 110,000 dollars. I am manually setting a sequence because I want to compare this map with White Americans’ median income, and that would be difficult if both maps aren’t on the same scale.
  • col.regions = RColorBrewer::brewer.pal(9, "PuBuGn") allows me to use an RColorBrewer palette to color my map.
  • popup = mylabel allows me to use the label I created previously.

Code
us_native_map = mapview(us_native_income,
                        zcol = "median_income",
                        at = seq(20000, 120000, 15000),
                        col.regions = RColorBrewer::brewer.pal(9, "PuBuGn"),
                        alpha.regions = 1,
                        popup = mylabel)

us_native_map
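
To check what the at argument actually produces, the breaks can be computed on their own in base R. The sketch below also bins the six state medians shown in the earlier output with cut(). That is only a rough analog of how the legend groups values, since mapview may treat the bin edges differently.

```r
# The same sequence passed to at =
breaks <- seq(20000, 120000, 15000)
breaks  # stops at 110000 because the next step would overshoot 120000

# Native American median household income for the six states printed earlier
native_medians <- c(54363, 53061, 76025, 43386, 40489, 51661)

# Assign each median to a bin; dig.lab = 6 keeps the labels out of scientific notation
bins <- cut(native_medians, breaks = breaks, dig.lab = 6)
table(bins)
```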

Voila! Map made. Now, we’ll do the same for White Americans and compare median household incomes.

Loading Variables: White Americans

Same process as before: creating a vector and adding median household income and aggregate income variables for White Americans.

Code
acs_vars2 = c(median_income = "B19013A_001",
              aggregate_income = "B19025A_001")

Creating a Dataframe

Pulling for all US states, like before:

Code
us_white_income <- get_acs(geography = "state",
                               variables = c(acs_vars2),
                               output = "wide",
                               geometry = TRUE)
Getting data from the 2017-2021 5-year ACS
Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Code
head(us_white_income)
Simple feature collection with 6 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1489 ymin: 33.00429 xmax: 179.7785 ymax: 71.36516
Geodetic CRS:  NAD83
  GEOID         NAME median_incomeE median_incomeM aggregate_incomeE
1    56      Wyoming          68965           1127       18759572300
2    02       Alaska          87920           1282       20543389900
3    24     Maryland         101579            715      175541416900
4    05     Arkansas          55985            556       70014969200
5    38 North Dakota          71755           1080       25774144700
6    10     Delaware          78491           1224       27498115400
  aggregate_incomeM                       geometry
1         359127001 MULTIPOLYGON (((-111.0546 4...
2         284978409 MULTIPOLYGON (((179.4825 51...
3        1250297569 MULTIPOLYGON (((-76.05015 3...
4         917428187 MULTIPOLYGON (((-94.61792 3...
5         372805071 MULTIPOLYGON (((-104.0487 4...
6         418809034 MULTIPOLYGON (((-75.56555 3...

Cleaning Data: White Americans

Removing margin of error columns:

Code
us_white_income <- us_white_income %>%
  select(-ends_with("M"))

head(us_white_income)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1489 ymin: 33.00429 xmax: 179.7785 ymax: 71.36516
Geodetic CRS:  NAD83
  GEOID         NAME median_incomeE aggregate_incomeE
1    56      Wyoming          68965       18759572300
2    02       Alaska          87920       20543389900
3    24     Maryland         101579      175541416900
4    05     Arkansas          55985       70014969200
5    38 North Dakota          71755       25774144700
6    10     Delaware          78491       27498115400
                        geometry
1 MULTIPOLYGON (((-111.0546 4...
2 MULTIPOLYGON (((179.4825 51...
3 MULTIPOLYGON (((-76.05015 3...
4 MULTIPOLYGON (((-94.61792 3...
5 MULTIPOLYGON (((-104.0487 4...
6 MULTIPOLYGON (((-75.56555 3...

Removing the trailing “E”:

Code
colnames(us_white_income) <- sub("E$", "", colnames(us_white_income)) # $ means end of string only

head(us_white_income)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1489 ymin: 33.00429 xmax: 179.7785 ymax: 71.36516
Geodetic CRS:  NAD83
  GEOID          NAM median_income aggregate_income
1    56      Wyoming         68965      18759572300
2    02       Alaska         87920      20543389900
3    24     Maryland        101579     175541416900
4    05     Arkansas         55985      70014969200
5    38 North Dakota         71755      25774144700
6    10     Delaware         78491      27498115400
                        geometry
1 MULTIPOLYGON (((-111.0546 4...
2 MULTIPOLYGON (((179.4825 51...
3 MULTIPOLYGON (((-76.05015 3...
4 MULTIPOLYGON (((-94.61792 3...
5 MULTIPOLYGON (((-104.0487 4...
6 MULTIPOLYGON (((-75.56555 3...

Add dollar signs in front of income values:

Code
us_white_income$median_income_signed <-paste("$",us_white_income$median_income)

Making a Map: White Americans

Making a popup to show state name and median household income:

Code
mylabel2 <- glue::glue("<strong>{us_white_income$NAM}</strong><br />
                      Median White American Household Income: {us_white_income$median_income_signed}") %>% 
  lapply(htmltools::HTML)

Making a map!

Code
us_white_map = mapview(us_white_income,
                       zcol = "median_income",
                       at = seq(20000, 120000, 15000),
                       col.regions = RColorBrewer::brewer.pal(9, "PuBuGn"),
                       alpha.regions = 1,
                       popup = mylabel2)

us_white_map

Comparing Income Inequality with Both Maps

Right off the bat, we can see that Native Americans do not make as much as White Americans. By setting the maps to the same scales, we are able to tell as much from the colors alone. No state shows Native Americans having a median household income of $80,000 or above.
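
The colors tell the story, but the gap can also be put in numbers. Here is a small sketch using only the six states printed in the head() outputs above, so it says nothing about the other states; the income data frame is a hypothetical helper built by hand from those printouts.

```r
# Median household income for the six states shown in the earlier outputs
income <- data.frame(
  state  = c("Wyoming", "Alaska", "Maryland", "Arkansas", "North Dakota", "Delaware"),
  native = c(54363, 53061, 76025, 43386, 40489, 51661),
  white  = c(68965, 87920, 101579, 55985, 71755, 78491)
)

# Dollar gap and Native-to-White ratio for each state
income$gap   <- income$white - income$native
income$ratio <- round(income$native / income$white, 2)

# Largest gaps first
income[order(-income$gap), ]

# Does any of these six states reach an 80,000 dollar Native median?
any(income$native >= 80000)
```

For all states, the same arithmetic could be run after joining the two datasets on GEOID.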

While these maps are already proving helpful in discovering income inequality between these two groups, we might benefit more by seeing these maps side-by-side. To do so, we’ll use the sync() function:

Code
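# Assumes the leafsync package is installed; in recent mapview versions sync() lives there
leafsync::sync(us_white_map, us_native_map)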

Conclusions

Now we can really see how White households in states like Washington, Minnesota, and Colorado (in addition to others) often make well over 80,000 dollars per year, while most Native American households don’t bring in more than 65,000 dollars per year. Even in Alaska, the state with the largest proportion of Natives, we see White Americans bringing in a median household income of 87,920 dollars, while Native Alaskans bring in a median household income of 53,061 dollars. In no state do Native Americans have a higher median income than White Americans, likely due to systemic racism that has disadvantaged Natives since colonists first settled in the US. These maps highlight how much more work needs to be done to educate and empower Native communities so they are given the resources needed to succeed financially.